Lake County
Water levels across the Great Lakes are falling – just as US data centers move in
Tue 16 Dec 2025 08.00 EST
Last modified on Tue 16 Dec 2025 08.02 EST
The sign outside Tom Hermes's farmyard in Perkins Township in Ohio, a short drive south of the shores of Lake Erie, proudly claims that his family have farmed the land here since 1900. Today, he raises 130 head of cattle and grows corn, wheat, grass and soybeans on 1,200 acres of land. For his family, his animals and wider business, water is life. So when, in May 2024, the Texas-based Aligned Data Centers broke ground on its NEO-01, four-building, 200,000 sq ft data center on a brownfield site that abuts farmland that Hermes rents, he was concerned. "We have city water here. That's going to reduce the pressure if they are sucking all the water," he says of the data center.
- North America > United States > Texas (0.25)
- Europe > Ukraine (0.06)
- North America > United States > Wisconsin (0.05)
Structural Properties, Cycloid Trajectories and Non-Asymptotic Guarantees of EM Algorithm for Mixed Linear Regression
Luo, Zhankun, Hashemi, Abolfazl
This work investigates the structural properties, cycloid trajectories, and non-asymptotic convergence guarantees of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR) with unknown mixing weights and regression parameters. Recent studies have established global convergence for 2MLR with known balanced weights and super-linear convergence in noiseless and high signal-to-noise ratio (SNR) regimes. However, the theoretical behavior of EM in the fully unknown setting remains unclear, with its trajectory and convergence order not yet fully characterized. We derive explicit EM update expressions for 2MLR with unknown mixing weights and regression parameters across all SNR regimes and analyze their structural properties and cycloid trajectories. In the noiseless case, we prove that the trajectory of the regression parameters in EM iterations traces a cycloid by establishing a recurrence relation for the sub-optimality angle, while in high SNR regimes we quantify its discrepancy from the cycloid trajectory. The trajectory-based analysis reveals the order of convergence: linear when the EM estimate is nearly orthogonal to the ground truth, and quadratic when the angle between the estimate and ground truth is small at the population level. Our analysis establishes non-asymptotic guarantees by sharpening bounds on statistical errors between finite-sample and population EM updates, relating EM's statistical accuracy to the sub-optimality angle, and proving convergence with arbitrary initialization at the finite-sample level. This work provides a novel trajectory-based framework for analyzing EM in Mixed Linear Regression.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
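The abstract gives no formulas, so the following is only a minimal sketch of an EM iteration for 2MLR under stated assumptions: the symmetric parametrization y = z⟨x, β⟩ + ε with z = +1 with probability π and -1 otherwise, Gaussian noise with known variance σ², and standard Gaussian covariates. The E-step uses the posterior log-odds log(π/(1 - π)) + 2y⟨x, β⟩/σ², and the M-step updates π and β in closed form. This is not the authors' code, and their update expressions may be parametrized differently; `simulate` and `run_em` are hypothetical helpers.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def em_step(xs, ys, beta, pi, sigma2):
    """One EM iteration for y = z * <x, beta> + noise, z = +1 w.p. pi, else -1."""
    d = len(beta)
    pi = min(max(pi, 1e-6), 1 - 1e-6)  # keep the prior log-odds finite
    gram = [[0.0] * d for _ in range(d)]
    rhs = [0.0] * d
    w_sum = 0.0
    for x, y in zip(xs, ys):
        proj = sum(xj * bj for xj, bj in zip(x, beta))
        # E-step: posterior probability that this sample has z = +1.
        w = sigmoid(math.log(pi / (1 - pi)) + 2.0 * y * proj / sigma2)
        w_sum += w
        for j in range(d):
            rhs[j] += (2.0 * w - 1.0) * y * x[j]
            for k in range(d):
                gram[j][k] += x[j] * x[k]
    # M-step: closed-form updates for the regression vector and mixing weight.
    return solve(gram, rhs), w_sum / len(xs)


def simulate(beta_star, pi_star, sigma, n, seed=0):
    """Draw (x, y) pairs with x ~ N(0, I) from the symmetric 2MLR model."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta_star]
        z = 1.0 if rng.random() < pi_star else -1.0
        y = z * sum(a * b for a, b in zip(x, beta_star)) + rng.gauss(0.0, sigma)
        xs.append(x)
        ys.append(y)
    return xs, ys


def run_em(xs, ys, beta0, pi0, sigma2, iters=30):
    """Iterate em_step from the initial point (beta0, pi0)."""
    beta, pi = list(beta0), pi0
    for _ in range(iters):
        beta, pi = em_step(xs, ys, beta, pi, sigma2)
    return beta, pi
```

For example, `run_em(*simulate([2.0, -1.0], 0.7, 0.1, 2000), [0.5, 0.5], 0.5, 0.01)` should land near β = (2, -1), π = 0.7. Note that (β, π) and (-β, 1 - π) describe the same model, so any check of the estimate has to allow for that sign ambiguity.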
DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision
Leng, Yongqi, Lei, Yikun, Liu, Xikai, Zhong, Meizhi, Xiong, Bojian, Zhang, Yurong, Gao, Yan, Wu, Yi, Hu, Yao, Xiong, Deyi
Agentic Retrieval-Augmented Generation (Agentic RAG) improves the handling of complex tasks through dynamic retrieval and adaptive workflows. Recent advances (e.g., Search-R1) have shown that outcome-supervised reinforcement learning achieves strong performance. However, this approach still suffers from inefficient exploration, sparse reward signals, and ambiguous global reward feedback. To address these challenges, we propose DecEx-RAG, which models RAG as a Markov Decision Process (MDP) incorporating decision-making and execution, while introducing an efficient pruning strategy to optimize data expansion. Through comprehensive process-level policy optimization, DecEx-RAG significantly enhances the autonomous task decomposition, dynamic retrieval, and high-quality answer generation capabilities of large language models (LLMs). Experiments show that DecEx-RAG achieves an average absolute performance improvement of 6.2% across six datasets, significantly outperforming existing baselines. Moreover, the pruning strategy improves data construction efficiency by nearly 6×, providing an efficient solution for process-supervised RAG training. The code is available at https://github.com/sdsxdxl/DecEx-RAG.
- Europe > Austria > Vienna (0.14)
- North America > United States > Missouri > Jackson County > Kansas City (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Leisure & Entertainment > Sports (1.00)
- Media (0.68)
- Law Enforcement & Public Safety (0.68)
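The abstract does not spell out DecEx-RAG's pruning rule. As a generic illustration of why pruning during data expansion pays off, the sketch below contrasts exhaustive expansion of a search tree (branching factor b, depth d, so b + b² + ... + b^d generated nodes) with a greedy scheme that scores the candidates at each step and keeps only the best one (b·d evaluations). `propose` and `score` are hypothetical stand-ins, for example for sampling candidate decisions from an LLM and estimating their value with rollouts; this is not the paper's actual algorithm.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def full_tree_nodes(branching: int, depth: int) -> int:
    """Nodes generated if every candidate at every step is expanded."""
    return sum(branching ** k for k in range(1, depth + 1))


def expand_with_pruning(
    root: S,
    propose: Callable[[S], List[S]],
    score: Callable[[S], float],
    depth: int,
) -> Tuple[List[S], int]:
    """Expand step by step, keeping only the best-scoring candidate at each step.

    Returns the kept trajectory and the number of candidates evaluated.
    """
    path, evaluated, state = [root], 0, root
    for _ in range(depth):
        candidates = propose(state)
        if not candidates:
            break
        evaluated += len(candidates)
        state = max(candidates, key=score)  # prune every sibling of the best candidate
        path.append(state)
    return path, evaluated
```

In a toy run where `propose = lambda s: [10 * s + i for i in range(3)]` and `score = lambda s: s % 10`, three steps evaluate 9 candidates, whereas `full_tree_nodes(3, 3)` is 39; the gap grows exponentially with depth.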
Fine-Scale Soil Mapping in Alaska with Multimodal Machine Learning
Lin, Yijun, Chen, Theresa, Brungard, Colby, Grunwald, Sabine, Ives, Sue, Macander, Matt, Nawrocki, Timm, Chiang, Yao-Yi, Jelinski, Nic
Fine-scale soil mapping in Alaska, which has traditionally relied on fieldwork and localized simulations, remains a critical yet underdeveloped task despite the region's ecological importance and extensive permafrost coverage. As permafrost thaw accelerates due to climate change, it threatens infrastructure stability and key ecosystem services, such as soil carbon storage. High-resolution soil maps are essential for characterizing permafrost distribution, identifying vulnerable areas, and informing adaptation strategies. We present MISO, a vision-based machine learning (ML) model to produce statewide fine-scale soil maps for near-surface permafrost and soil taxonomy. The model integrates a geospatial foundation model for visual feature extraction, implicit neural representations for continuous spatial prediction, and contrastive learning for multimodal alignment and geo-location awareness. We compare MISO with Random Forest (RF), a traditional ML model that has been widely used in soil mapping applications. Spatial cross-validation and regional analysis across Permafrost Zones and Major Land Resource Areas (MLRAs) show that MISO generalizes better to remote, unseen locations and achieves higher recall than RF, which is critical for monitoring permafrost thaw and related environmental processes. These findings demonstrate the potential of advanced ML approaches for fine-scale soil mapping and provide practical guidance for future soil sampling and infrastructure planning in permafrost-affected landscapes. The project will be released at https://github.com/knowledge-computing/Peatland-permafrost.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Alaska > Fairbanks North Star Borough > Fairbanks (0.14)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)
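The MISO abstract relies on spatial cross-validation to test generalization to remote, unseen locations, but does not say how the folds were built. A common scheme, assumed here, is grid blocking: every sample falls into a square block, and whole blocks are assigned to folds, so spatially autocorrelated neighbors never sit on both sides of a train/test split. `block_size` is in the units of the coordinates (for example metres in a projected coordinate system), and the round-robin block assignment is a simplification that does not balance fold sizes.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def spatial_block_folds(
    points: Sequence[Tuple[float, float]], block_size: float, n_folds: int
) -> List[List[int]]:
    """Group point indices by grid block, then deal whole blocks out to folds."""
    blocks: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(idx)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for i, key in enumerate(sorted(blocks)):
        folds[i % n_folds].extend(blocks[key])  # a block never straddles two folds
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists, holding out one fold at a time."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

Each held-out fold then plays the role of "unseen locations": a model such as RF or MISO is fit on `train` and scored (for example by recall) on `test`.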
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Wang, Xiyao, Yang, Zhengyuan, Feng, Chao, Liang, Yongyuan, Zhou, Yuhang, Liu, Xiaoyu, Zang, Ziyi, Li, Ming, Lin, Chung-Ching, Lin, Kevin, Li, Linjie, Huang, Furong, Wang, Lijuan
Reinforcement learning (RL) has proven highly effective for fine-tuning large language models (LLMs) on tasks that are challenging yet easily verifiable, such as math reasoning or code generation. However, extending this success to visual perception in vision-language models (VLMs) has been impeded by the scarcity of vision-centric tasks that are simultaneously challenging and unambiguously verifiable. To this end, we introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Starting from 200-word captions, we inject a single, subtle visual description error (altering a few words about objects, attributes, counts, or spatial relations) and task the model with pinpointing the corrupted span given the image and the modified caption. This formulation preserves the full perceptual difficulty while providing a binary, exact-match reward that is easy to compute and unambiguous. Models trained with the ViCrit task exhibit substantial gains across a variety of VL benchmarks. Crucially, the improvements transfer beyond natural-image training data to abstract image reasoning and visual math, suggesting that models learn to perceive rather than merely memorize seen objects. To facilitate evaluation, we further introduce ViCrit-Bench, a category-balanced diagnostic benchmark that systematically probes perception errors across diverse image domains and error types. Together, our results demonstrate that fine-grained hallucination criticism is an effective and generalizable objective for enhancing visual perception in VLMs.
- North America > United States > Michigan (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- North America > United States > Indiana > Lake County > Gary (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
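The ViCrit task pairs an injected caption error with a binary, exact-match reward. A minimal sketch of both pieces is shown below; the function names are hypothetical, the injection is a plain string substitution rather than the paper's error-generation pipeline, and the lowercase/punctuation normalization before matching is an assumption, not something the abstract specifies.

```python
import re
from typing import Tuple


def normalize(span: str) -> str:
    """Lowercase and keep only word characters, so 'Blue sofa.' matches 'blue sofa'."""
    return " ".join(re.findall(r"\w+", span.lower()))


def inject_error(caption: str, original: str, replacement: str) -> Tuple[str, str]:
    """Swap the first occurrence of `original`; return the new caption and the corrupted span."""
    if original not in caption:
        raise ValueError(f"{original!r} does not occur in the caption")
    return caption.replace(original, replacement, 1), replacement


def exact_match_reward(predicted_span: str, corrupted_span: str) -> float:
    """Binary RL reward: 1.0 only if the model pinpoints exactly the corrupted span."""
    return 1.0 if normalize(predicted_span) == normalize(corrupted_span) else 0.0
```

Because the reward is a single string comparison, it is cheap to compute and leaves no room for a learned judge to be gamed, which is the property the abstract emphasizes.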
Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports
Amirrajab, Sina, Vehof, Volker, Bietenbeck, Michael, Yilmaz, Ali
Purpose: We investigated the use of privacy-preserving, locally deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. Materials and Methods: We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into various cardiac diagnostic categories based on descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics including accuracy, precision, recall, and F1 score. We also employed confusion matrices to examine patterns of misclassification across models. Results: Most open-source LLMs demonstrated exceptional performance in classifying reports into different diagnostic categories. Google's Gemma2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. Apart from Mistral and DeepseekR1-7B, all other evaluated models attained average scores above 0.93. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports. Conclusion: Our findings demonstrate the feasibility of implementing open-source, privacy-preserving LLMs in clinical settings for automated analysis of imaging reports, enabling accurate, fast, and resource-efficient diagnostic categorization.
- Europe > Netherlands > Limburg > Maastricht (0.05)
- North America > United States > Indiana > Lake County > Munster (0.04)
- Europe > Germany > Hamburg (0.04)
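The CMR study reports per-model precision, recall, F1 and confusion matrices. The abstract does not state how the "average" F1 is taken; the sketch below assumes a macro average (unweighted mean over categories) and uses hypothetical category labels purely for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str]) -> Counter:
    """Sparse confusion matrix: counts of (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def per_class_scores(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall and F1 for every label seen in either list."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

For instance, with true labels `["DCM", "DCM", "HCM", "Normal"]` and predictions `["DCM", "HCM", "HCM", "Normal"]`, the per-class F1 scores are 2/3, 2/3 and 1, giving a macro F1 of 7/9 ≈ 0.78.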
Toward a digital twin of U.S. Congress
Helm, Hayden, Chen, Tianyi, McGuinness, Harvey, Lee, Paige, Duderstadt, Brandon, Priebe, Carey E.
In this paper we provide evidence that a virtual model of U.S. congresspersons based on a collection of language models satisfies the definition of a digital twin. In particular, we introduce and provide high-level descriptions of a daily-updated dataset that contains every Tweet from every U.S. congressperson during their respective terms. We demonstrate that a modern language model equipped with congressperson-specific subsets of this data is capable of producing Tweets that are largely indistinguishable from actual Tweets posted by their physical counterparts. We illustrate how generated Tweets can be used to predict roll-call voting behavior and to quantify the likelihood of congresspersons crossing party lines, thereby assisting stakeholders in allocating resources and potentially impacting real-world legislative dynamics. We conclude with a discussion of the limitations and important extensions of our analysis.
- Asia > China (0.05)
- North America > United States > Indiana > Lake County > Griffith (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
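The digital-twin paper predicts how likely congresspersons are to cross party lines from generated Tweets. Any such prediction needs an observed target to compare against; one plausible definition, assumed here rather than taken from the paper, is the fraction of roll calls on which a member votes against the majority position of their own party. The vote format (a dict per roll call mapping member to "yea"/"nay") and the member names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

RollCall = Dict[str, str]  # member -> "yea" / "nay"


def party_majority(votes: RollCall, party_of: Dict[str, str], party: str) -> Optional[str]:
    """Most common position within `party`, or None when there is no clear majority."""
    counts = Counter(v for m, v in votes.items() if party_of[m] == party)
    ranked = counts.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def crossing_rate(roll_calls: List[RollCall], party_of: Dict[str, str], member: str) -> float:
    """Share of the member's votes that go against their party's majority position."""
    crossed = considered = 0
    for votes in roll_calls:
        if member not in votes:
            continue  # member absent or not voting
        majority = party_majority(votes, party_of, party_of[member])
        if majority is None:
            continue  # tied party vote: no line to cross
        considered += 1
        crossed += votes[member] != majority
    return crossed / considered if considered else 0.0
```

The party majority here includes the member's own vote; excluding it is an equally reasonable choice and changes results only for small delegations.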
SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence
Liu, Zhining, Amjad, Rana Ali, Adkathimar, Ravinarayana, Wei, Tianxin, Tong, Hanghang
Providing Language Models (LMs) with relevant evidence in the context (either retrieved or user-provided) can significantly improve their ability to provide factually correct, grounded responses. However, recent studies have found that LMs often struggle to fully comprehend and utilize key evidence from the context, especially when it contains noise and irrelevant information, an issue common in real-world scenarios. To address this, we propose SelfElicit, an inference-time approach that helps LMs focus on key contextual evidence through self-guided explicit highlighting. By leveraging the inherent evidence-finding capabilities of LMs using the attention scores of deeper layers, our method automatically identifies and emphasizes key evidence within the input context, facilitating more accurate and factually grounded responses without additional training or iterative prompting. We demonstrate that SelfElicit brings consistent and significant improvements on multiple evidence-based QA tasks across various LM families while maintaining computational efficiency. Our code and documentation are available at https://github.com/ZhiningLiu1998/SelfElicit.
- Oceania > Australia > South Australia (0.15)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Media > Music (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
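SelfElicit scores context using the attention of deeper layers and then explicitly highlights the key evidence. The sketch below shows only the post-processing idea under assumptions: per-token attention values are supplied as plain floats (obtaining them would require a model that exposes its attention maps), sentences are scored by their mean token attention, and a sentence is highlighted if its score reaches `alpha` times the best score. The threshold rule and the `<evidence>` markers are hypothetical choices, not the paper's exact recipe.

```python
from typing import List, Sequence, Tuple


def sentence_scores(token_attn: Sequence[float], spans: Sequence[Tuple[int, int]]) -> List[float]:
    """Mean attention received by each sentence's tokens; spans are non-empty [start, end)."""
    return [sum(token_attn[s:e]) / (e - s) for s, e in spans]


def highlight_evidence(
    sentences: Sequence[str],
    token_attn: Sequence[float],
    spans: Sequence[Tuple[int, int]],
    alpha: float = 0.5,
    open_tag: str = "<evidence>",
    close_tag: str = "</evidence>",
) -> str:
    """Wrap sentences whose score is at least alpha * max score in marker tags."""
    scores = sentence_scores(token_attn, spans)
    threshold = alpha * max(scores)
    return " ".join(
        f"{open_tag}{sent}{close_tag}" if score >= threshold else sent
        for sent, score in zip(sentences, scores)
    )
```

The highlighted context is then fed back to the same model with the question, so no extra training or repeated prompting is needed, in line with the abstract's claim.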
Model Successor Functions
Chang, Yingshan, Bisk, Yonatan
The notion of generalization has moved away from the classical one defined in statistical learning theory toward an emphasis on out-of-domain generalization (OODG). Recently, there has been a growing focus on inductive generalization, where a progression of difficulty implicitly governs the direction of domain shifts. In inductive generalization, it is often assumed that the training data lie on the easier side, while the testing data lie on the harder side. The challenge is that training data are always finite, yet a learner is expected to infer an inductive principle that can be applied in an unbounded manner. This emerging regime has appeared in the literature under different names, such as length/logical/algorithmic extrapolation, but a formal definition is lacking. This work provides such a formalization, centered on the concept of model successors. We then outline directions to adapt well-established techniques toward the learning of model successors. This work calls for restructuring the research discussion around inductive generalization, moving from fragmented task-centric communities to a more unified effort focused on universal properties of learning and computation.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Health & Medicine (0.67)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
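The inductive-generalization setting above (training data on the easier side, test data on the harder side) can be made concrete with a difficulty-ordered split. The sketch below uses bit-string parity with sequence length as the difficulty axis, one instance of the "length extrapolation" the abstract mentions; the task choice and the helper names are illustrative assumptions, not the paper's formalization of model successors.

```python
import random
from typing import List, Tuple

Example = Tuple[List[int], int]  # (bit string, parity label)


def parity_example(length: int, rng: random.Random) -> Example:
    """Random bit string of the given length and its parity."""
    bits = [rng.randint(0, 1) for _ in range(length)]
    return bits, sum(bits) % 2


def inductive_split(
    max_train_len: int, max_test_len: int, per_len: int, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Easy side (length <= max_train_len) for training, strictly harder lengths for testing."""
    rng = random.Random(seed)
    train: List[Example] = []
    test: List[Example] = []
    for length in range(1, max_test_len + 1):
        bucket = train if length <= max_train_len else test
        bucket.extend(parity_example(length, rng) for _ in range(per_len))
    return train, test
```

A learner that has only memorized the training lengths fails on the test side; succeeding requires inferring the rule (here, XOR folded over the sequence) that extends to unbounded lengths.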
Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers
Raza, Mohammad, Milic-Frayling, Natasa
Robustness of reasoning remains a significant challenge for large language models, and addressing it is essential for the practical applicability of AI-driven reasoning systems. We introduce Semantic Self-Verification (SSV), a novel approach that addresses the key challenge in combining language models with the rigor of logical solvers: accurately translating the reasoning problem from natural language into the formal language of the solver. SSV uses a consistency-based approach to produce strong abstract formalizations of problems using concrete instantiations that are generated by the model and verified by the solver. In addition to significantly improving overall reasoning accuracy over the state of the art, a key novelty of this approach is a verification feature with near-perfect precision over a significant coverage of cases, as we demonstrate on open reasoning benchmarks. We propose such *near-certain reasoning* as a new approach to reduce the need for manual verification in many cases, taking us closer to more dependable and autonomous AI reasoning systems.
- Asia > Indonesia > Bali (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Indiana > Lake County > Gary (0.04)
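SSV accepts an abstract formalization only if it agrees with concrete instantiations of the problem. A toy sketch of that consistency check is shown below, under heavy assumptions: premises are hand-written Python predicates over boolean variables rather than LLM-generated solver code, each instantiation is a concrete scenario labeled with whether it should satisfy the premises, and a truth-table enumeration stands in for a real logical solver. All names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Env = Dict[str, bool]
Predicate = Callable[[Env], bool]


def entails(variables: Sequence[str], premises: List[Predicate], conclusion: Predicate) -> bool:
    """Brute-force stand-in for a solver: no model of the premises violates the conclusion."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False
    return True


def consistent_with_instances(premises: List[Predicate], instances: List[Tuple[Env, bool]]) -> bool:
    """Keep a formalization only if every concrete instance gets its expected verdict."""
    return all(all(p(env) for p in premises) == expected for env, expected in instances)


# "If it rains, the ground is wet. It rains." Two candidate formalizations:
good = [lambda e: (not e["rain"]) or e["wet"], lambda e: e["rain"]]
bad = [lambda e: (not e["wet"]) or e["rain"], lambda e: e["rain"]]  # implication reversed
instances = [({"rain": True, "wet": True}, True), ({"rain": True, "wet": False}, False)]
```

Here `good` passes both instance checks and entails "the ground is wet", while `bad` fails the second instance and would be rejected before its (wrong) answer is reported, which is the kind of filtering that makes high-precision verification possible.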